
    PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

    Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/. Comment: Accepted to RSS 2018.
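    The symmetric-object loss described in the abstract can be sketched compactly: rotate the object's model points by the estimated and ground-truth quaternions, then average each estimated point's squared distance to its nearest ground-truth point, so that any pose related to the ground truth by an object symmetry incurs (near-)zero loss. The NumPy code below is an illustrative approximation of this ShapeMatch-style loss, not the released PoseCNN code; function names and the averaging convention are assumptions.

```python
import numpy as np

def quat_to_rot(q):
    # unit quaternion (w, x, y, z) -> 3x3 rotation matrix
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def shapematch_loss(q_est, q_gt, model_points):
    # Average, over points rotated by the estimated pose, of the squared
    # distance to the closest point under the ground-truth pose. Poses
    # related by an object symmetry therefore give (near-)zero loss.
    p_est = model_points @ quat_to_rot(q_est).T
    p_gt = model_points @ quat_to_rot(q_gt).T
    d2 = ((p_est[:, None, :] - p_gt[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() / 2.0
```

    For a square plate (symmetric under 90-degree rotation about its normal), an estimate rotated 90 degrees from the ground truth scores the same near-zero loss as an exact estimate, which is the behavior a plain quaternion distance cannot provide.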

    An Analysis of Bottom-Up Attention Models and Multimodal Representation Learning for Visual Question Answering

    A Visual Question Answering (VQA) task is the ability of a system to take an image and an open-ended, natural language question about the image and provide a natural language text answer as the output. VQA is a relatively nascent field, with only a few strategies explored. The performance of VQA systems, in terms of accuracy of answers to the image-question pairs, requires a considerable overhaul before such systems can be used in practice. The general system for performing the VQA task consists of an image encoder network, a question encoder network, a multi-modal attention network that combines the information obtained from the image and the question, and an answering network that generates natural language answers for the image-question pair. In this thesis, we follow two strategies to improve the performance (accuracy) of VQA. The first is a representation learning approach (utilizing state-of-the-art Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)) to improve the image encoding system of VQA. This thesis evaluates four variants of GANs to identify an architecture that best captures the data distribution of the images; it was determined that the GAN variants become unstable and fail to yield a viable image encoding system for VQA. The second strategy is to evaluate an alternative approach to the attention network, using multi-modal compact bilinear pooling, in the existing VQA system. The second strategy led to an increase in the accuracy of VQA by 2% compared to the current state-of-the-art technique.
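    Multimodal compact bilinear pooling approximates the (huge) outer product of the image and question feature vectors by Count Sketch projections combined via FFT, exploiting the fact that the convolution of two sketches equals the sketch of the outer product. The NumPy sketch below illustrates the mechanism only; the output dimension, hashing scheme, and function names are assumptions, not the thesis's implementation.

```python
import numpy as np

def count_sketch(x, h, s, d):
    # Project x into d dims: bucket each coordinate via hash h with sign s.
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

def mcb_pool(v_img, v_q, d=1024, seed=0):
    rng = np.random.default_rng(seed)
    h1 = rng.integers(0, d, len(v_img))
    s1 = rng.choice([-1.0, 1.0], len(v_img))
    h2 = rng.integers(0, d, len(v_q))
    s2 = rng.choice([-1.0, 1.0], len(v_q))
    # Element-wise product in the frequency domain == circular convolution
    # of the two sketches == sketch of the bilinear (outer-product) features.
    f = np.fft.rfft(count_sketch(v_img, h1, s1, d)) * \
        np.fft.rfft(count_sketch(v_q, h2, s2, d))
    return np.fft.irfft(f, n=d)
```

    The appeal is that the pooled vector has dimension d instead of the product of the two input dimensions, making bilinear interactions tractable inside the attention network.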

    Dynamic Multi-Heuristic A*

    Many motion planning problems in robotics are high-dimensional planning problems. While sampling-based motion planning algorithms handle the high dimensionality very well, the solution qualities are often hard to control due to the inherent randomization. In addition, they suffer severely when the configuration space has several ‘narrow passages’. Search-based planners, on the other hand, typically provide good solution qualities and are not affected by narrow passages. However, in the absence of a good heuristic, or when there are deep local minima in the heuristic, they suffer from the curse of dimensionality. In this work, our primary contribution is a method for dynamically generating heuristics, in addition to the original heuristic(s) used, to guide the search out of local minima. With the ability to escape local minima easily, the effect of dimensionality becomes less pronounced. On the theoretical side, we provide guarantees on completeness and bounds on the suboptimality of the solution found. We compare our proposed method with the recently published Multi-Heuristic A* search and the popular RRT-Connect in a full-body mobile manipulation domain for the PR2 robot, and show its benefits over these approaches.
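    The multi-queue mechanics underlying Multi-Heuristic A* can be sketched as follows: one priority queue per heuristic, expanded round-robin, with every successor pushed into all queues so that an inadmissible heuristic can pull the search out of a local minimum of another. This is a deliberately simplified illustration; it omits the anchor-queue suboptimality test that gives the real algorithm its bounds, and all names are assumptions.

```python
import heapq

def mha_star(start, goal, neighbors, heuristics, w=2.0):
    # Round-robin expansion from one queue per heuristic. For bounded
    # suboptimality the first (anchor) heuristic should be admissible.
    g = {start: 0.0}
    parent = {start: None}
    opens = [[(w * h(start), start)] for h in heuristics]
    closed = set()
    i = 0
    while any(opens):
        q = opens[i % len(opens)]
        i += 1
        if not q:
            continue
        _, u = heapq.heappop(q)
        if u in closed:
            continue
        closed.add(u)
        if u == goal:                      # reconstruct path via parents
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, c in neighbors(u):
            if g[u] + c < g.get(v, float('inf')):
                g[v] = g[u] + c
                parent[v] = u
                for q2, h in zip(opens, heuristics):
                    heapq.heappush(q2, (g[v] + w * h(v), v))
    return None
```

    The paper's contribution sits on top of this loop: when the search stalls in a heuristic local minimum, a new heuristic is generated on the fly and added as another queue.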

    Task-oriented planning for manipulating articulated mechanisms under model uncertainty

    Personal robots need to manipulate a variety of articulated mechanisms as part of day-to-day tasks. These tasks are often specific, goal-driven, and permit very little bootstrap time for learning the articulation type. In this work, we address the problem of purposefully manipulating an articulated object with uncertainty in the type of articulation. To this end, we provide two primary contributions: first, an efficient planning algorithm that, given a set of candidate articulation models, is able to correctly identify the underlying model and simultaneously complete a task; and second, a representation for articulated objects called the Generalized Kinematic Graph (GK-Graph), which allows for modeling complex mechanisms whose articulation varies as a function of the state space. Finally, we provide a practical method to auto-generate candidate articulation models from RGB-D data and present extensive results on the PR2 robot to demonstrate the utility of our representation and the efficiency of our planner.
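    Identifying the underlying model while acting can be framed as a Bayesian filter over the candidate articulation models: each model predicts the motion an action should produce, and the posterior concentrates on whichever model best explains what is actually observed. The sketch below is a generic illustration of that idea, not the paper's planner; the Gaussian observation noise and the two toy models are assumptions.

```python
import numpy as np

def update_belief(belief, models, action, observation, sigma=0.05):
    # belief: dict model_name -> probability. Each model maps an action to
    # a predicted end-effector displacement; weight each hypothesis by the
    # Gaussian likelihood of the observed displacement, then renormalize.
    post = {}
    for name, predict in models.items():
        err = np.linalg.norm(observation - predict(action))
        post[name] = belief[name] * np.exp(-0.5 * (err / sigma) ** 2)
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}
```

    For example, a drawer (prismatic joint) and a door (revolute joint) predict different handle trajectories for the same pull, so a single informative observation can already make one hypothesis dominate.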

    SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation

    Amodal recognition is the ability of a system to detect occluded objects. Most state-of-the-art visual recognition systems lack the ability to perform amodal recognition. A few studies have achieved amodal recognition through passive prediction or embodied recognition approaches. However, these approaches suffer from challenges in real-world applications, such as dynamic objects. We propose SeekNet, an improved optimization method for amodal recognition through embodied visual recognition. Additionally, we implement SeekNet for social robots, where there are multiple interactions with crowded humans. Hence, we focus on occluded human detection and tracking and showcase the superiority of our algorithm over other baselines. We also experiment with SeekNet to improve the confidence of COVID-19 symptom pre-screening algorithms using our efficient embodied recognition system.

    Technique of mass multiplication of Tenobracon deesae (Cam.) (Hymenoptera: Braconidae) for use against sugarcane and maize borers

    This article does not have an abstract.

    The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis

    We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological literature on sentiment, we unveil distinct conceptualizations of this term in domains such as finance, government, and medicine. Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. Our findings underscore the significance of adopting an interdisciplinary approach to defining sentiment in SA and offer a pragmatic solution for its implementation. Comment: This paper has been accepted and will appear at the EMNLP 2023 Main Conference.

    Global gene expression analysis of the mouse colonic mucosa treated with azoxymethane and dextran sodium sulfate

    Background: Chronic inflammation is well known to be a risk factor for colon cancer. Previously we established a novel mouse model of inflammation-related colon carcinogenesis, which is useful for examining the involvement of inflammation in colon carcinogenesis. To shed light on the alterations in global gene expression in the background of inflammation-related colon cancer and gain further insights into the molecular mechanisms underlying inflammation-related colon carcinogenesis, we conducted a comprehensive DNA microarray analysis using our model. Methods: Male ICR mice were given a single ip injection of azoxymethane (AOM, 10 mg/kg body weight), followed by the addition of 2% (w/v) dextran sodium sulfate (DSS) to their drinking water for 7 days, starting 1 week after the AOM injection. We performed DNA microarray analysis (Affymetrix GeneChip) on non-tumorous mucosa obtained from mice that received AOM/DSS, AOM alone, or DSS alone, and from untreated mice at wks 5 and 10. Results: Markedly up-regulated genes in the colonic mucosa given AOM/DSS at wk 5 or 10 included Wnt inhibitory factor 1 (Wif1, 48.5-fold increase at wk 5 and 5.7-fold increase at wk 10), plasminogen activator, tissue (Plat, 48.5-fold increase at wk 5), myelocytomatosis oncogene (Myc, 3.0-fold increase at wk 5), and phospholipase A2, group IIA (platelets, synovial fluid) (Plscr2, 8.0-fold increase at wk 10). The notable down-regulated genes in the colonic mucosa of mice treated with AOM/DSS were the peroxisome proliferator activated receptor binding protein (Pparbp, 0.06-fold decrease at wk 10) and the transforming growth factor, beta 3 (Tgfb3, 0.14-fold decrease at wk 10). The inflammation-related gene peroxisome proliferator activated receptor γ (Pparγ, 0.38-fold decrease at wk 5) was also down-regulated in the colonic mucosa of mice that received AOM/DSS. Conclusion: This is the first report describing global gene expression analysis of an AOM/DSS-induced mouse colon carcinogenesis model, and our findings provide new insights into the mechanisms of inflammation-related colon carcinogenesis and into the establishment of novel therapies and preventative strategies against carcinogenesis.
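    The fold changes quoted above (e.g., 48.5-fold increase, 0.14-fold decrease) are ratios of mean expression in treated versus control mucosa. A minimal sketch of how such up/down calls are derived is shown below; the 2-fold cutoff is a common microarray convention and an assumption here, not necessarily the threshold used in this study.

```python
import numpy as np

def classify_genes(expr_treated, expr_control, threshold=2.0):
    # expr_*: dict gene -> array of expression values across replicates.
    # Fold change is the ratio of mean treated to mean control expression;
    # values above the threshold are called up-regulated, values below its
    # reciprocal are called down-regulated.
    calls = {}
    for gene in expr_treated:
        fc = np.mean(expr_treated[gene]) / np.mean(expr_control[gene])
        if fc >= threshold:
            calls[gene] = ('up', fc)
        elif fc <= 1.0 / threshold:
            calls[gene] = ('down', fc)
        else:
            calls[gene] = ('unchanged', fc)
    return calls
```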

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
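    "Decoder-only Transformer" means every layer uses causally masked self-attention: token t may attend only to tokens at positions up to t, which is what lets the model be trained and sampled left to right. The single-head NumPy sketch below illustrates just that masking; BLOOM additionally uses multi-head projections and ALiBi positional biases, which are omitted here, and all names are assumptions.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    # x: (seq_len, dim). Decoder-only attention: mask out all positions
    # strictly above the diagonal so token t only sees tokens <= t.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf
    # row-wise softmax over the unmasked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

    A direct consequence of the mask is that the first token's output depends only on the first token's value vector, regardless of the rest of the sequence.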